Prosper Loan Data Exploration

by Adebajo Ruth Oluwadamilola

Table of Contents

Introduction

This project explores a dataset, Loan Data from Prosper, provided by Udacity. It was downloaded from the following link: https://www.google.com/url?q=https://s3.amazonaws.com/udacity-hosted-downloads/ud651/prosperLoanData.csv&sa=D&ust=1581581520570000 . The dataset contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others.

Preliminary Wrangling

Importing Python libraries

Data Gathering
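
A minimal sketch of the gathering step, assuming the CSV from the link above has been saved locally and that pandas is the library in use. The helper name `load_loans`, the file name in the comment, and the three sample columns are illustrative assumptions, not the notebook's original code.

```python
import io

import pandas as pd


def load_loans(source):
    """Read the loan CSV from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# Tiny in-memory stand-in so the sketch runs without the real file.
# With the downloaded data this would be: loans = load_loans("prosperLoanData.csv")
sample_csv = io.StringIO(
    "ListingKey,LoanOriginalAmount,BorrowerRate\n"
    "A1,9425,0.158\n"
    "B2,10000,0.092\n"
)
loans = load_loans(sample_csv)
print(loans.shape)
```

On the full file, the shape would be expected to match the 113,937 loans and 81 variables described in the introduction.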

Data Assessing

I will assess the data visually and programmatically for quality and tidiness issues.

Visual Assessment

Programmatic Assessment

Programmatically assess the loan data
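
One way a programmatic assessment might look, sketched on a toy frame rather than the real data. The `assess` helper and the toy values are hypothetical; the checks shown (shape, duplicate rows, missing values, dtypes) are common starting points and may not match the checks actually run in this notebook.

```python
import pandas as pd


def assess(df):
    """Return a small dict of programmatic quality checks for a DataFrame."""
    return {
        "shape": df.shape,
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": df.isna().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Toy data (invented for illustration): one repeated row, one missing rate.
toy = pd.DataFrame(
    {
        "ListingKey": ["A1", "B2", "B2"],
        "BorrowerRate": [None, 0.092, 0.092],
    }
)
report = assess(toy)
print(report["duplicate_rows"], report["missing_per_column"])
```

Collecting the checks in one dictionary makes it easy to rerun the same assessment after each cleaning step.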

Quality Issues

Tidiness Issues

Data Cleaning

Define

Code

Test

Define

Code

Test

Define

Code

Define

Code

Test

Define

Code

Test

Define

Code

Test

Define

Code

Test

Define

Code

Test

Define

Code

Test

Define

Code

Test

Storing Data

Store the cleaned loan data as a CSV file.
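
A short sketch of the storing step, assuming pandas. The helper `store_loans`, the output file name `prosper_loan_clean.csv`, and the toy frame are assumptions made for illustration; the notebook's actual file name is not shown in the text.

```python
import os
import tempfile

import pandas as pd


def store_loans(df, path):
    """Write the DataFrame to CSV without the index column."""
    df.to_csv(path, index=False)


# Toy stand-in for the cleaned data (values invented for illustration).
cleaned = pd.DataFrame(
    {"ListingKey": ["A1", "B2"], "LoanOriginalAmount": [9425, 10000]}
)

# A temporary directory keeps the sketch from writing into the project folder.
out_path = os.path.join(tempfile.mkdtemp(), "prosper_loan_clean.csv")
store_loans(cleaned, out_path)

# Reading the file back is a quick check that nothing was lost on the way out.
reloaded = pd.read_csv(out_path)
print(reloaded.shape)
```

Passing `index=False` avoids an extra unnamed index column appearing when the file is read back.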

What is the structure of this dataset?

This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others.

What are the main features of interest in the dataset?

I am interested in finding out which features give in-depth insight into the borrowers and the loans they took.

What features in the dataset will help support investigation into your features of interest?

Listed below are the features I would like to examine.

Exploratory Data Analysis

Univariate Exploration

Investigating distributions of individual variables
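
As a rough illustration of a univariate check, the sketch below counts categories and measures skewness with pandas. The column names `BorrowerState` and `StatedMonthlyIncome` are assumed to correspond to the state and monthly income variables discussed in this notebook, and the toy values are invented.

```python
import pandas as pd

# Toy data (invented): three borrowers in CA and one very high income.
toy = pd.DataFrame(
    {
        "BorrowerState": ["CA", "CA", "TX", "NY", "CA"],
        "StatedMonthlyIncome": [2500.0, 3000.0, 3500.0, 4000.0, 30000.0],
    }
)

# Frequency of each category, most common first.
state_counts = toy["BorrowerState"].value_counts()

# Sample skewness; a positive value indicates a longer right tail.
income_skew = toy["StatedMonthlyIncome"].skew()

print(state_counts.idxmax())
print(income_skew > 0)
```

For a strongly right-skewed variable, plotting on a log scale or trimming extreme values is a common follow-up, though whether that was done here is not stated in the text.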

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Bivariate Exploration

Investigating the relationships between pairs of variables of interest in the data.

Correlation Analysis
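
A minimal sketch of a pairwise correlation check using pandas `DataFrame.corr`. The column name `InvestmentFromFriendsCount` is an assumption standing in for the "friends that invest" variable, and the toy values are constructed so that one pair is perfectly correlated and another is uncorrelated; they are not results from the real data.

```python
import pandas as pd

# Toy data (invented): the first two columns move together,
# while LenderYield is arranged to have zero linear relationship with them.
toy = pd.DataFrame(
    {
        "Recommendations": [0, 1, 2, 3, 4],
        "InvestmentFromFriendsCount": [0, 1, 2, 3, 4],
        "LenderYield": [0.10, 0.30, 0.20, 0.30, 0.10],
    }
)

# Pearson correlation matrix across all numeric columns.
corr = toy.corr(method="pearson")

strong = corr.loc["Recommendations", "InvestmentFromFriendsCount"]
weak = corr.loc["Recommendations", "LenderYield"]
print(round(strong, 3), round(abs(weak), 3))
```

The resulting matrix can be passed to a heatmap for a visual overview, if a plotting library is available.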

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Multivariate Exploration

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Were there any interesting or surprising interactions between features?

Conclusions

In the univariate exploration, one insight was that California (CA) has the highest number of borrowers. Looking at the borrowers' employment status, I found that the majority are listed as either Employed or Full-time. Investigating their income further, I found that most borrowers fall in the 25,000-74,999 income range, and that the monthly income distribution is right-skewed, with most values below 30k. The income ratio is right-skewed as well. In the bivariate exploration, which looks at relationships between pairs of variables, I found a high correlation between 'Recommendations' and 'friends that invest', and little to no correlation between LenderYield and PercentFunded. Lastly, in the multivariate exploration,

Limitations